Realignment from Finer-grained Alignment to Coarser-grained Alignment to Enhance Mongolian-Chinese SMT

Authors

  • Jing Wu
  • Hongxu Hou
  • Congjiao Xie

Abstract

The conventional Mongolian-Chinese statistical machine translation (SMT) model is trained on Mongolian words and Chinese words. However, data sparsity, complex Mongolian morphology, and Chinese word segmentation (CWS) errors lead to alignment errors and ambiguities. Other work uses finer-grained Mongolian stems and Chinese characters, which suffer from information loss when inducing translation rules. To tackle this, we proposed a method that uses finer-grained Mongolian stems and Chinese characters for word alignment, but coarser-grained Mongolian words and Chinese words for translation rule induction (TRI) and decoding. We presented a heuristic technique to transform Chinese character-based alignment into word-based alignment. In experiments, our method outperformed both baselines, fully finer-grained and fully coarser-grained, in terms of alignment quality and translation performance.
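The abstract does not spell out the authors' heuristic for turning character-based alignment into word-based alignment. The sketch below uses a simple union projection as a stand-in: a Chinese word is linked to a Mongolian token if any of its characters is linked to it. It also assumes that each Mongolian word yields exactly one stem, so stem positions equal word positions. The function name `char_to_word_alignment` and its argument layout are hypothetical.

```python
def char_to_word_alignment(char_links, chinese_words):
    """Project character-level alignment links onto word-level links.

    Hypothetical sketch, not the paper's exact heuristic.

    char_links: iterable of (mongolian_index, chinese_char_index) pairs.
        Assumes one stem per Mongolian word, so stem index == word index.
    chinese_words: segmented Chinese words (strings) whose concatenation
        is the character sequence that was aligned.
    Returns a sorted list of (mongolian_index, chinese_word_index) pairs.
    """
    # Map each character position to the index of the word containing it.
    char_to_word = []
    for word_index, word in enumerate(chinese_words):
        char_to_word.extend([word_index] * len(word))

    # Union heuristic: link a word to a Mongolian token if any of the
    # word's characters is linked to that token; the set removes duplicates.
    word_links = {(m, char_to_word[c]) for m, c in char_links}
    return sorted(word_links)


if __name__ == "__main__":
    words = ["我们", "喜欢", "音乐"]  # 6 characters, 3 words
    links = [(0, 0), (0, 1), (2, 2), (2, 3), (1, 4), (1, 5)]
    print(char_to_word_alignment(links, words))
    # [(0, 0), (1, 2), (2, 1)]
```

The union rule favors recall. A stricter variant could link a word only when most of its characters are linked to the same token, trading recall for precision.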

Publication date: 2015